Abstract. Multiple Instance Learning (MIL) has recently provided an appealing
way to alleviate the drifting problem in visual tracking. Following the
tracking-by-detection framework, an online MILBoost approach has been
developed that sequentially chooses weak classifiers by maximizing the bag
likelihood. In this paper, we extend this idea by incorporating instance
significance estimation into the online MILBoost framework. First, instead of
treating all instances equally, we associate with each instance a
significance-coefficient that represents its contribution to the bag
likelihood. The coefficients are estimated by a simple Bayesian formula that
jointly considers the predictions from several standard MILBoost classifiers.
Next, we follow the online boosting framework and propose a new criterion for
the selection of weak classifiers. Experiments on challenging public datasets
show that the proposed method outperforms both existing MIL-based and
boosting-based trackers.


1 Introduction 


Tracking-by-detection has emerged as a leading approach for accurate and
robust visual tracking [14]. This is primarily because it treats tracking as a
detection problem, thereby avoiding the need to model object dynamics,
especially in the presence of abrupt motions [15] and occlusions.
Tracking-by-detection typically involves training a classifier to detect the
target in individual frames. Once an initial detector is learned in the first
frame, it progressively evolves to account for appearance variations in both
the target and its surroundings.

It is well known that accurate selection of training samples for updating the
detector is critical for a successful tracking-by-detection method. One common
approach is to take the current tracking location as the single positive
example and to use samples collected around that location as negatives. While
this simple approach works well in some cases, the positive example used for
detector updating may not be optimal if the tracking location is slightly
inaccurate. Over time this degrades the performance of the tracker. In
contrast, many methods [2, 6] use multiple positive examples for updating,
where the examples are sampled from a small neighborhood around the current
object location.

In principle, the latter updating scheme should be better because it exploits
much more information. However, as reported in the existing literature, it may
confuse the appearance model since the label information about the positives
is not precise, which makes it difficult to find an accurate decision
boundary. Consequently, a suitable algorithm needs to handle this kind of
ambiguity in the training data, especially in the positive examples. Multiple
Instance Learning (MIL) can be exploited to achieve this goal, since it allows
a weaker form of supervision that learns under instance label uncertainty. For
example, recent advances in object detection demonstrate that MIL can largely
improve detection performance. Inspired by these applications, Babenko et al.
propose an online MILBoost approach to address the ambiguity problem in visual
tracking. Along this thread, Zhang et al. [13] propose an online weighted MIL
tracker, and Bae et al. introduce a structural appearance representation into
the MIL-based tracking framework. In general, MIL enables these approaches to
deal with slight appearance variations of the target during tracking, in which
case most instances in the positive bag are relatively close to a true
positive. However, the trackers may fail under strong ambiguity, e.g., motion
blur or pose change.

To address this gap, we follow the online boosting framework and propose a
novel formulation of MILBoost for visual tracking. The central idea behind our
approach is to learn the significance of instances, which we call
significance-coefficients, and to incorporate them into the bag likelihood to
guide the selection of weak classifiers in boosting. In particular, we begin
by building a group of randomized MILBoost learners, each of which provides
its estimate of the probability that an instance is positive. Assuming that
the learners are independent, we show that the significance-coefficients can
be easily estimated through a simple Bayesian formulation. Further, we
introduce a variant of the bag likelihood function based upon the
significance-coefficients for the selection of weak classifiers.


2 Proposed Approach 

In the following, we first review the standard online multiple instance
boosting method for tracking and analyze its underlying limitations. This
analysis then motivates our new extension, which yields an accurate appearance
model able to cope with diverse and complex tracking scenarios.


2.1 Online Multiple Instance Boosting 

Recently, Babenko et al. proposed a novel online boosting algorithm for MIL
(online MILBoost) to address the example selection problem for adaptive
appearance model updating in tracking. In particular, given a training data
set $\{(X_1, y_1), \ldots, (X_n, y_n)\}$ in the current frame, where a bag
$X_i = \{x_{i1}, \ldots, x_{im}\}$ and $y_i \in \{0, 1\}$ is its label, as
well as a pool of $M$ candidate weak classifiers
$\mathcal{H} = \{h_1, h_2, \ldots, h_M\}$, MILBoost sequentially chooses $K$
weak classifiers from the candidate pool based upon the following criterion:

$$h_k = \arg\max_{h \in \mathcal{H}} \mathcal{L}(H_{k-1} + h) \qquad (1)$$

where $\mathcal{L} = \sum_i \big(y_i \log p_i + (1 - y_i) \log(1 - p_i)\big)$
is the log-likelihood over bags, and $H_{k-1} = \sum_{j=1}^{k-1} h_j$ is the
strong classifier consisting of the first $k - 1$ weak classifiers. Note that
$\mathcal{L}$ is the bag likelihood rather than the instance likelihood used
in traditional supervised learning approaches, and $p_i$ indicates the
probability of bag $i$ being positive, which is defined by the Noisy-OR model:

$$p_i = p(y_i | X_i) = 1 - \prod_j \big(1 - p(y_i | x_{ij})\big) \qquad (2)$$

and $p(y_i | x_{ij}) = \sigma(H(x_{ij}))$ is the instance probability, where
$\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid function.
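
The Noisy-OR model of Eq. (2) and the bag log-likelihood in Eq. (1) can be illustrated with a minimal Python sketch. The function names, the NumPy dependency, and the `eps` clipping constant are assumptions made for illustration and are not part of the original method description; the input is taken to be the strong-classifier scores $H(x_{ij})$ of each bag.

```python
import numpy as np


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def bag_probability(scores):
    """Noisy-OR bag probability of Eq. (2).

    scores: strong-classifier outputs H(x_ij) for the instances of one bag.
    """
    p_inst = sigmoid(scores)            # p(y_i | x_ij) for every instance
    return 1.0 - np.prod(1.0 - p_inst)  # 1 - prod_j (1 - p(y_i | x_ij))


def bag_log_likelihood(bag_scores, bag_labels, eps=1e-12):
    """Log-likelihood over bags, as defined below Eq. (1).

    eps is a hypothetical clipping constant that keeps log() finite.
    """
    total = 0.0
    for scores, y in zip(bag_scores, bag_labels):
        p = np.clip(bag_probability(scores), eps, 1.0 - eps)
        total += y * np.log(p) + (1 - y) * np.log(1.0 - p)
    return float(total)
```

For example, a bag of two instances with score 0 has instance probabilities of 0.5 each, so its Noisy-OR probability is 1 - 0.5 * 0.5 = 0.75: one confident instance is enough to make a bag positive, but every instance enters the product with the same weight.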

Note that the Noisy-OR model in Eq. (2), which is used to account for
ambiguity, assumes that all instances in a bag contribute equally to the bag
likelihood. This is imprecise because, according to the MIL formulation, a
positive bag contains at least one positive instance, but it may also contain
many negative ones. Clearly, the model in Eq. (2) cannot identify the true
positives in the positive bags. While [13] mitigates this problem using a
weighted MILBoost method, we observe that slight inaccuracies in the tracking
results lead to inaccurate weights, thereby degrading the tracking
performance. Furthermore, not only is the likelihood model too restrictive,
but a single MILBoost classifier is also not flexible enough to capture the
multi-modal distribution of the target appearance.


2.2 Significance-Coefficients Estimation 

The previous analysis motivates our extension of standard MILBoost to a more
robust model that can handle various challenging situations. Here we aim to
integrate the instance significance into the learning procedure. Note that our
method is essentially different from [13] because we determine the instance
significance discriminatively rather than simply weighting the instances
according to the Euclidean distances between the instances and the object
location. In particular, we begin with training $N$ learners:

$$\{H_1, \ldots, H_N\} \qquad (3)$$

where $H_k$ denotes a randomized MILBoost classifier learned as in Sec. 2.1,
and the randomization is obtained by sampling different negative examples for
each learner. Then, for each instance $x_{ij}$, its significance-coefficient
$r_{ij}$ is jointly determined by the predictions of the learners:

$$r_{ij} = p(y_{ij} | H_1, \ldots, H_N) \qquad (4)$$

where $y_{ij}$ denotes the label of $x_{ij}$. Assuming that the randomized
MILBoost classifiers are conditionally independent, we can rewrite the above
formulation as:


$$r_{ij} \propto p(H_1, \ldots, H_N | y_{ij})\, p(y_{ij}) \qquad (5)$$

$$= p(y_{ij}) \prod_{k=1}^{N} p(H_k | y_{ij}) \qquad (6)$$

Note that we also have
$p(H_k | y_{ij}) = \frac{p(y_{ij} | H_k)\, p(H_k)}{p(y_{ij})}$; since $p(H_k)$
does not depend on $y_{ij}$, the above formulation is equivalent to:

$$r_{ij} \propto p(y_{ij}) \prod_{k=1}^{N} \frac{p(y_{ij} | H_k)}{p(y_{ij})} \qquad (7)$$

where $p(y_{ij})$ is the prior indicating the probability that $x_{ij}$ is
positive, i.e., $y_{ij} = 1$, and $p(y_{ij} | H_k) = \sigma(H_k(x_{ij}))$ is
the prediction of $H_k$ over instance $x_{ij}$.

Eq. (7) has two characteristics in computing the significance-coefficients:
1) if the predicted probability $p(y_{ij} | H_k)$ is larger than the prior
$p(y_{ij})$, the significance of $x_{ij}$ is enhanced; 2) considering the
multiplicative part $\prod_{k=1}^{N} p(y_{ij} | H_k)$, each predicted value
can be viewed as imposing a weight on the other predictions. This intuitively
benefits the significance estimation procedure.

Given the significance-coefficients of all instances in a positive bag, we
follow the underlying philosophy of MIL to estimate the bag significance:

$$r_i = \max_j r_{ij} \qquad (8)$$

It should be noted that in MIL, ambiguity only exists in the positive bags.
Hence, we only estimate the significance-coefficients for instances in the
positive bags, and fix the significance of negative instances to
$r_{ij} = 1$, thus $r_i = 1$.
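
As a minimal sketch of Eq. (7) and Eq. (8), the following Python code combines the per-learner predictions into significance-coefficients. The function names, the default prior of 0.5, and the optional two-class normalization (`normalize=True`) are assumptions made for illustration; the text only states the proportionality in Eq. (7) and does not specify how, or whether, the coefficients are normalized.

```python
import numpy as np


def significance_coefficients(preds, prior=0.5, normalize=False):
    """Significance-coefficients r_ij following Eq. (7).

    preds: array of shape (N, m); preds[k, j] = p(y_ij = 1 | H_k) = sigma(H_k(x_ij)).
    prior: p(y_ij = 1); the value 0.5 is an assumed default.
    normalize: if True, divide by the sum over both labels so that r_ij lies
               in (0, 1). This normalization is an assumption, not a step
               stated in the text.
    """
    preds = np.asarray(preds, dtype=float)
    # Unnormalized posterior for y_ij = 1: p(y) * prod_k p(y | H_k) / p(y)
    pos = prior * np.prod(preds / prior, axis=0)
    if not normalize:
        return pos
    # Same product for y_ij = 0, used only for the assumed normalization
    neg = (1.0 - prior) * np.prod((1.0 - preds) / (1.0 - prior), axis=0)
    return pos / (pos + neg)


def bag_significance(r):
    """Bag significance of Eq. (8): the maximum over the instances of a bag."""
    return float(np.max(r))
```

With three learners predicting 0.9, 0.8 and 0.7 for an instance and a prior of 0.5, the unnormalized coefficient is 0.5 * 1.8 * 1.6 * 1.4 = 2.016, whereas predictions of 0.2, 0.4 and 0.5 give 0.5 * 0.4 * 0.8 * 1.0 = 0.16; predictions above the prior reinforce each other, and predictions below it suppress the coefficient.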


2.3 Refinement of Online MILBoost 

As introduced before, the Noisy-OR model is not precise because it does not
take the instance significance into account. In this work, we extend the
Noisy-OR model in Eq. (2) to the following:

$$p_i = p(y_i | X_i) = 1 - \prod_j \big(1 - p(y_i | x_{ij})\big)^{a r_{ij}} \qquad (9)$$

The novel exponent term enables us to integrate the instance significance into
Eq. (2). In particular, the instance $x_{ij}$ is treated as if it were
repeated $a r_{ij}$ times in the bag, and $a$ is a constant that denotes the
maximal possible repetition number for the instances. In our experiments, we
set $a = 1$ for the negative bags so that Eq. (9) is equivalent to Eq. (2),
and empirically set $a = 3$ for the positive bags to incorporate instance
significance.
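
A small Python sketch may help to see the effect of the exponent. It assumes that the exponent in the extended Noisy-OR model is the product $a \cdot r_{ij}$ (the repetition count of instance $x_{ij}$); the function name and the argument layout are hypothetical.

```python
import numpy as np


def weighted_bag_probability(p_inst, r, a):
    """Extended Noisy-OR: 1 - prod_j (1 - p_ij) ** (a * r_ij).

    p_inst: instance probabilities p(y_i | x_ij) of one bag.
    r:      significance-coefficients r_ij of the same instances.
    a:      maximal repetition number (1 for negative bags, 3 for positive
            bags in the experiments described in the text).
    The exponent a * r_ij is an assumption of this sketch.
    """
    p_inst = np.asarray(p_inst, dtype=float)
    r = np.asarray(r, dtype=float)
    return 1.0 - np.prod((1.0 - p_inst) ** (a * r))
```

With `a = 1` and all coefficients equal to 1 the function reduces to the standard Noisy-OR, e.g. 0.75 for two instances of probability 0.5. With `a = 3` and coefficients `[1, 0]`, the second instance is ignored and the first counts three times, giving 1 - 0.5 ** 3 = 0.875.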

Next, we develop an extended log-likelihood function over the bags as:

$$\mathcal{L}_e = \sum_i \big(y_i \log(p_i) + (1 - y_i) \log(1 - p_i)\big) \qquad (10)$$

where $p_i$ is given by Eq. (9). Given the new log-likelihood function, we
train a boosted classifier of weak learners:

$$h_k = \arg\max_{h \in \mathcal{H}} \mathcal{L}_e(H_{e,k-1} + h) \qquad (11)$$

This is similar to the procedure in Eq. (1) except that we use the novel
likelihood function $\mathcal{L}_e$ instead of $\mathcal{L}$ for weak
classifier selection. Finally, we obtain a strong classifier $H_e$ that is
used as our discriminative appearance model.
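
The greedy selection of weak classifiers can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: candidate outputs are precomputed in a matrix, a candidate is not selected twice, the per-instance exponent is passed in directly (1 recovers the standard Noisy-OR), and all function and parameter names are hypothetical.

```python
import numpy as np


def make_bag_objective(bags, labels, exponents=None, eps=1e-12):
    """Build a bag log-likelihood over instance scores.

    bags:      list of index arrays, one per bag, into the score vector.
    labels:    bag labels y_i in {0, 1}.
    exponents: per-instance exponent of the Noisy-OR term (assumed layout);
               None means exponent 1 for every instance.
    """
    def objective(scores):
        scores = np.asarray(scores, dtype=float)
        e = np.ones_like(scores) if exponents is None else np.asarray(exponents, dtype=float)
        p_inst = 1.0 / (1.0 + np.exp(-scores))
        total = 0.0
        for idx, y in zip(bags, labels):
            p = 1.0 - np.prod((1.0 - p_inst[idx]) ** e[idx])
            p = np.clip(p, eps, 1.0 - eps)
            total += y * np.log(p) + (1 - y) * np.log(1.0 - p)
        return float(total)
    return objective


def select_weak_classifiers(weak_outputs, objective, K):
    """Greedily pick K candidates, each maximizing objective(H + h).

    weak_outputs: array (M, n_instances) with h_m(x) for every candidate.
    Returns the chosen indices and the scores of the resulting strong classifier.
    """
    weak_outputs = np.asarray(weak_outputs, dtype=float)
    H = np.zeros(weak_outputs.shape[1])
    chosen = []
    for _ in range(K):
        best_m, best_val = None, -np.inf
        for m in range(weak_outputs.shape[0]):
            if m in chosen:  # selection without replacement (assumption)
                continue
            val = objective(H + weak_outputs[m])
            if val > best_val:
                best_m, best_val = m, val
        chosen.append(best_m)
        H = H + weak_outputs[best_m]
    return chosen, H
```

Passing the objective as a callable keeps the selection loop identical for the standard and the extended likelihood; only the exponents handed to `make_bag_objective` differ.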
2.4 Weak Classifiers


In this work, each object bounding box is represented using a set of Haar-like
features [10]. Each feature consists of 2 to 4 rectangles, and each rectangle
has a real-valued weight. Thus, the feature value is a weighted sum of the
pixels in all the rectangles.

With each Haar-like feature $f_k$, we associate a weak classifier $h_k$ with
four parameters $(\mu_1, \sigma_1, \mu_0, \sigma_0)$:

$$h_k(x) = \log \frac{p_t(y = 1 | f_k(x))}{p_t(y = 0 | f_k(x))} \qquad (12)$$

where $p_t(f_k(x) | y = 1) \sim \mathcal{N}(\mu_1, \sigma_1)$ and similarly
for $y = 0$. Note that the above equation holds under a uniform prior
assumption, i.e., $p(y = 1) = p(y = 0)$. Following prior work, we update all
the weak classifiers in parallel when new examples
$\{(x_1, y_1), \ldots, (x_n, y_n)\}$ are passed in:

$$\mu_1 \leftarrow \gamma \mu_1 + (1 - \gamma) \frac{1}{n} \sum_{i | y_i = 1} f_k(x_i) \qquad (13)$$

$$\sigma_1 \leftarrow \gamma \sigma_1 + (1 - \gamma) \sqrt{\frac{1}{n} \sum_{i | y_i = 1} \big(f_k(x_i) - \mu_1\big)^2} \qquad (14)$$

where $\gamma \in [0, 1]$ is the learning rate. The update rules for $\mu_0$
and $\sigma_0$ are similarly defined. Note that our randomized MILBoost
learners $\{H_1, \ldots, H_N\}$ and the new classifier $H_e$ share the pool of
candidate weak classifiers, as well as the updating rules.
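
A minimal Python sketch of such a Gaussian weak classifier is given here for illustration. The class name, the initial parameter values, the use of the per-class sample count as the normalizer $n$, the use of the pre-update mean inside the variance term, and the lower bound on $\sigma$ are all assumptions of this sketch; the text does not fix these details.

```python
import numpy as np


class GaussianWeakClassifier:
    """Weak classifier on one scalar feature value f_k(x), cf. Eqs. (12)-(14)."""

    def __init__(self, gamma=0.85):
        self.gamma = gamma                  # learning rate
        self.mu = np.array([0.0, 0.0])      # index 0: negative, 1: positive (assumed init)
        self.sigma = np.array([1.0, 1.0])   # assumed initial standard deviations

    def update(self, f, y):
        """Running update of (mu_c, sigma_c) for both classes c in {0, 1}."""
        f = np.asarray(f, dtype=float)
        y = np.asarray(y)
        g = self.gamma
        for c in (0, 1):
            fc = f[y == c]
            if fc.size == 0:
                continue
            # Spread around the pre-update mean (ordering is an assumption)
            spread = np.sqrt(np.mean((fc - self.mu[c]) ** 2))
            self.sigma[c] = g * self.sigma[c] + (1.0 - g) * spread
            self.mu[c] = g * self.mu[c] + (1.0 - g) * np.mean(fc)

    def predict(self, f):
        """Log-odds of Eq. (12) under a uniform class prior."""
        f = np.asarray(f, dtype=float)

        def log_pdf(c):
            s = max(self.sigma[c], 1e-6)  # guard against a zero deviation
            return -np.log(s) - 0.5 * ((f - self.mu[c]) / s) ** 2

        # The constant -0.5 * log(2 * pi) cancels in the difference
        return log_pdf(1) - log_pdf(0)
```

Under the uniform prior, the posterior ratio in Eq. (12) equals the ratio of the two Gaussian likelihoods, which is why `predict` only needs the class-conditional densities.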


2.5 Tracking Algorithm 

In this section, we summarize our tracking algorithm. Without loss of
generality, we assume the object location $\ell^*_{t-1}$ at time $t - 1$ is
given. 1) We first crop out some image patches
$X^{\gamma} = \{x : \|\ell(x) - \ell^*_{t-1}\| < \gamma\}$ as positive
instances, and other ones
$X^{\gamma,\beta} = \{x : \gamma < \|\ell(x) - \ell^*_{t-1}\| < \beta\}$ as
the negative instances, where $\ell(x)$ denotes the location of patch $x$, and
$\gamma$ and $\beta$ are two scalar radii (measured in pixels). 2) Given the
training examples, we learn a group of randomized MILBoost classifiers
$\{H_1, \ldots, H_N\}$ as well as an improved MILBoost classifier $H_e$. 3) At
time $t$, we crop out a set of image patches
$X^{s} = \{x : \|\ell(x) - \ell^*_{t-1}\| < s\}$, where $s$ is a small search
radius. 4) The object location $\ell^*_t$ is ultimately obtained by:

$$\ell^*_t = \ell\Big(\arg\max_{x \in X^{s}} p(y | x)\Big) \qquad (15)$$

where $p(y | x) = \sigma(H_e(x))$ is the appearance model. For subsequent
frames, our tracker repeats the above procedure to capture the object
locations.
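
The sampling and localization steps of the tracking procedure can be sketched in Python as follows. The sketch works on integer pixel offsets and takes the appearance model as a callable; the function names and the dense enumeration of all locations inside the radius are assumptions made for illustration.

```python
import numpy as np


def locations_within(center, r_outer, r_inner=None):
    """Pixel locations l with r_inner < ||l - center|| < r_outer.

    With r_inner=None only the upper bound is applied, which corresponds to
    the positive set and the search set; with both radii it corresponds to
    the annulus used for the negative set.
    """
    cx, cy = center
    R = int(np.ceil(r_outer))
    out = []
    for dx in range(-R, R + 1):
        for dy in range(-R, R + 1):
            d = np.hypot(dx, dy)
            if d < r_outer and (r_inner is None or d > r_inner):
                out.append((cx + dx, cy + dy))
    return out


def locate(prev_loc, s, appearance_prob):
    """Eq. (15): return the candidate location with maximal p(y | x).

    appearance_prob: callable mapping a location to sigma(H_e(x)) of the
    patch at that location (hypothetical interface).
    """
    candidates = locations_within(prev_loc, s)
    probs = [appearance_prob(c) for c in candidates]
    return candidates[int(np.argmax(probs))]
```

For instance, `locations_within((10, 10), 2)` returns the 9 locations of the 3 x 3 neighborhood, and `locations_within((10, 10), 3, r_inner=2)` returns the 12 locations of the surrounding ring.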






Fig. 1. Illustration results of the Boy sequence. Compared trackers: OAB, SBT,
WMIL, MIL, and Ours. (Best viewed in color)


Fig. 2. Illustration results of the Dog sequence. Compared trackers: OAB, SBT,
WMIL, MIL, and Ours. (Best viewed in color)


3 Experiments 


To evaluate the performance of the proposed algorithm thoroughly, we perform
experiments on nine publicly available sequences with different challenging
properties. The total number of frames tested is more than 9000. We compare
the method against four other state-of-the-art algorithms: MIL, WMIL [13],
OAB, and SBT. For a fair comparison, we run the source code provided by the
authors with tuned parameters to obtain their best performance.

Our tracker is implemented in MATLAB and runs at 15 frames per second on a
2.93GHz Intel Core i7 CPU. In the experiments, the search radius $s$ is set to
25 pixels, and the radii $\gamma$ and $\beta$ are set to 4 and 50
respectively. For the negative image patches, we randomly select 200 patches
from the negative set of Sec. 2.5. Then, the randomized learners
$\{H_1, \ldots, H_N\}$ and $H_e$ are updated online using only 50 of the 200
negative patches. The number of randomized MILBoost classifiers is set to
$N = 3$, and the learning rate $\gamma$ in Eq. (13) and Eq. (14) is fixed to
0.85. Finally, the number of weak classifiers $M$ is set to 150, and each time
$K = 15$ classifiers are chosen to form a strong classifier.


Table 1. Average Center Location Error (in pixels). Top two results are shown
in Red and Blue fonts.

Seq         OAB     SBT     WMIL    MIL     Ours
Boy         6.8     7.3     58.1    30.3    5.4
Dog         19.9    27.8    13.3    25.5    10.2
Doll        19.6    12.1    41.5    30.2    8.3
Dollar      38.2    77.6    35.1    74.3    9.3
Girl        25.0    18.0    54.4    38.8    20.2
Panda       8.2     7.2     6.3     7.8     6.7
Sylv        18.7    17.0    19.9    44.7    14.5
Twinings    33.9    19.7    21.7    20.5    18.3
Walking     5.2     5.3     11.9    6.6     5.0
Average     19.5    21.3    29.1    31.0    10.9


Table 2. Average Overlap Rate. Top two results are shown in Red and Blue.

Seq         OAB     SBT     WMIL    MIL     Ours
Boy         0.67    0.53    0.43    0.48    0.78
Dog         0.40    0.39    0.45    0.47    0.48
Doll        0.59    0.64    0.39    0.34    0.75
Dollar      0.61    0.25    0.63    0.29    0.73
Girl        0.54    0.71    0.44    0.41    0.62
Panda       0.73    0.80    0.71    0.76    0.79
Sylv        0.65    0.64    0.60    0.43    0.70
Twinings    0.54    0.81    0.55    0.57    0.81
Walking     0.71    0.70    0.51    0.64    0.74
Average     0.60    0.61    0.52    0.49    0.71


3.1 Quantitative Evaluation 

We employ two widely used criteria to evaluate the performance of the
trackers: 1) Center Location Error (CLE), which measures the distance between
the center of the tracking result and the center of the ground truth; 2) VOC
Overlap Rate (VOR), which evaluates the success ratio of the algorithms and is
calculated by $\mathrm{VOR} = \frac{|B_T \cap B_G|}{|B_T \cup B_G|}$, where
$B_T$ denotes the tracked bounding box, $B_G$ is the ground truth box, and
$|\cdot|$ denotes the number of pixels in a region. Tab. 1 and Tab. 2
respectively summarize the average CLEs and the average VORs of the compared
trackers on the nine videos. The benefits of our tracker are notable: it
performs best on 7 of 9 videos in terms of the average CLE as well as the
average VOR. Compared with MIL and WMIL, the performance improvement is
particularly notable on the Boy, Doll and Walking sequences. As discussed in
[11], these sequences, which contain motion blur and low-resolution targets,
are highly challenging for previous MIL-based trackers. In our study, the
multiple randomized classifiers enable us to capture the complex multi-modal
distribution of the target appearance. Furthermore, our bag likelihood
function is more accurate than those of the previous algorithms. Hence, our
algorithm can better handle these challenges.


Fig. 3. Illustration results of the Doll sequence. Compared trackers: OAB,
SBT, WMIL, MIL, and Ours. (Best viewed in color)


Fig. 4. Illustration results of the Dollar sequence. Compared trackers: OAB,
SBT, WMIL, MIL, and Ours. (Best viewed in color)
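
The two evaluation criteria of this section, CLE and VOR, can be computed per frame with a few lines of Python. The sketch assumes axis-aligned boxes given as `(x, y, w, h)` with `(x, y)` the top-left corner and treats areas as continuous; the box layout and the function names are assumptions, not something specified in the text.

```python
import math


def center_location_error(box_t, box_g):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    xt, yt, wt, ht = box_t
    xg, yg, wg, hg = box_g
    return math.hypot((xt + wt / 2.0) - (xg + wg / 2.0),
                      (yt + ht / 2.0) - (yg + hg / 2.0))


def voc_overlap_rate(box_t, box_g):
    """Intersection area divided by union area of two (x, y, w, h) boxes."""
    xt, yt, wt, ht = box_t
    xg, yg, wg, hg = box_g
    # Width and height of the intersection rectangle (zero if disjoint)
    iw = max(0.0, min(xt + wt, xg + wg) - max(xt, xg))
    ih = max(0.0, min(yt + ht, yg + hg) - max(yt, yg))
    inter = iw * ih
    union = wt * ht + wg * hg - inter
    return inter / union if union > 0 else 0.0
```

For a tracked box `(0, 0, 10, 10)` and a ground-truth box `(5, 0, 10, 10)`, the centers are 5 pixels apart, the intersection covers 50 pixels and the union 150 pixels, so the overlap rate is 1/3. Averaging both quantities over all frames of a sequence gives per-sequence figures of the kind reported in the tables.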
Fig. 5. Illustration results of the Girl sequence. (Best viewed in color)



Fig. 6. Illustration results of the Panda sequence. (Best viewed in color) 


3.2 Qualitative Evaluation 

In this section, we qualitatively compare our method with other trackers in 
dealing with various challenging factors. 

Fast Motion: We first evaluate the trackers on two sequences with fast motion,
Boy and Doll. These sequences are challenging because fast movement may result
in a blurred object appearance that is difficult to handle in object tracking.
As shown in Fig. 1, MIL and WMIL fail to track the target before frame #94 in
Boy, in which the target appearance undergoes significant change, and OAB and
SBT also locate the target inaccurately in frames #227 and #541. Our method
tracks the sequence successfully with a small error, mainly because of the
more accurate likelihood function.

For the Doll sequence, the object appearance changes drastically as the target
moves back and forth. As illustrated in Fig. 3, OAB easily loses the target at
the beginning of the sequence (e.g., #343). Subsequently, SBT, WMIL, and MIL
also do not deal well with the motion blur when the target undergoes fast
motion (e.g., #3548). Overall, our algorithm can accurately estimate the
location of the target throughout the sequence.

Pose Change: We next evaluate our method in dealing with large pose change
over four challenging sequences, i.e., Sylv, Girl, Dog and Twinings. In both
the Sylv and Dog sequences, the targets undergo large pose changes. As shown
in Fig. 7 and Fig. 2, the previous multiple instance learning based trackers,
i.e., MIL and WMIL, fail in these situations since the likelihood of the
target is not accurately estimated. After long-term tracking, the two trackers
generally lose the target (e.g., #884, #1171, and #1273 in Sylv, #1240 in Dog,
#70, #165, #188, #224). In contrast, OAB, SBT and our tracker handle the
appearance change caused by pose change well and give better results.

For the Girl and Twinings sequences, the objects undergo out-of-plane
rotations as well as heavy occlusion. The WMIL algorithm performs worst on
these two sequences, as shown in Fig. 5 and Fig. 8. OAB, SBT and our tracker
are able to track the target in the two sequences.

Other Challenges: The Panda (Fig. 6) and Walking sequences show that our
method copes well with situations where the target is of low resolution,
primarily because our method can select more discriminative features in the
boosting stage than the previous approaches.

For the Dollar sequence, there is a distractor which may cause the trackers to
fail. As presented in Fig. 4, the MIL and SBT trackers easily drift from the
target due to the object with similar appearance. The OAB, WMIL and our
tracker perform well in this challenging situation. Moreover, our algorithm
gives more accurate tracking results than these two methods, as illustrated in
Tab. 1 and Tab. 2.

Finally, the Dog and Doll sequences reveal that our tracker is more stable
than the other methods during long-term tracking: by incorporating the
significance-coefficients of instances, our MIL method handles the ambiguity
well when updating the appearance model, as shown in Fig. 2 and Fig. 3.


4 Conclusion 


Inspired by the recent success of multiple instance learning (MIL) in
tracking, we proposed a novel algorithm that incorporates the
significance-coefficients of instances into the online MILBoost framework. Our
approach consists of two steps: (i) significance-coefficient estimation via a
Bayesian formulation based on the predictions given by the randomized MILBoost
classifiers, and (ii) a flexible scheme for incorporating the instance
significance into the objective function of online MILBoost. In the
experiments, we evaluated our method on several publicly available datasets,
and the results show that it outperforms the compared MIL-based and
boosting-based trackers.


Fig. 7. Illustration results of the Sylv sequence. (Best viewed in color)


Fig. 8. Illustration results of the Twinings sequence. Compared trackers: OAB,
SBT, WMIL, MIL, and Ours. (Best viewed in color)